Data are the core of everything that we do in statistical analysis. Data come in many forms, and I don’t just mean .csv, .xls, .sav, etc. Data can be wide, long, documented, fragmented, messy, and about anything else that you can imagine.
Although data could arguably be more means than end in psychology, the importance of understanding the structure and format of your data cannot be overstated. Failure to understand your data can lead to improper techniques and, at worst, flagrantly wrong inferences. This is especially important for longitudinal data. We will discuss many aspects of data handling. One thing to note is that this is just ONE WAY to do it; there are many equivalent approaches.
Why are we thinking about data? Because 80%, maybe more, of your time spent with “analysis” is spent getting data in order and setting up your model of interest.
Also known as: multivariate vs. stacked; person vs. person-period; untidy vs. tidy.
Long format is what MLM, ggplot2, and the tidyverse packages expect, whereas SEM and many descriptive statistics are calculated using wide dataframes.

In this figure X, Y, and Z could represent different waves of collection. For each wave we have some value for each of the two people in the dataset. In the long format, each person has each wave as a separate row. In the wide format, each person has all of their data on a single row.
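To make the figure concrete, here is a minimal sketch (with made-up people and values) of the same two-person, three-wave data in each format:

```r
library(tidyverse)

# Toy version of the figure: two people, three waves (X, Y, Z) of a
# hypothetical measure
long <- tribble(
  ~person, ~wave, ~value,
  "A", "X", 1,
  "A", "Y", 2,
  "A", "Z", 3,
  "B", "X", 4,
  "B", "Y", 5,
  "B", "Z", 6
)

# Long: one row per person per wave (6 rows).
# Wide: one row per person, one column per wave (2 rows).
wide <- long %>%
  pivot_wider(names_from = wave, values_from = value)
wide
```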
We will be working with long data for the first half of the class and wide data the second. However, even during the first half we will need to switch back and forth to make sure we can calculate certain values.
The best package for going back and forth between long and wide is the tidyr package, which is part of the tidyverse. Here we will walk through some examples of the primary functions, pivot_wider() and pivot_longer().
For longitudinal/repeated measures data in long format, each row is an observation, so each person will have multiple rows. You can grab some example data from the class’s GitHub:
library(tidyverse)

data <- read.csv("https://raw.githubusercontent.com/josh-jackson/longitudinal-2021/master/example.csv")

example <- data %>%
  select(ID, wave, group, DAN)
example
ID wave group DAN
1 6 1 PD 0.1619
2 6 2 PD 0.1677
3 6 3 PD 0.2153
4 29 1 PD 0.1749
5 29 2 PD 0.1356
6 34 1 CTRL 0.1659
7 34 2 CTRL 0.1403
8 36 1 CTRL 0.1522
9 36 2 CTRL 0.2053
10 37 1 PD 0.2194
11 37 2 PD 0.1579
12 37 3 PD 0.2586
13 48 1 PD 0.1302
14 48 2 PD 0.2703
15 48 3 PD 0.2478
16 53 1 CTRL 0.2112
17 53 2 CTRL 0.1521
18 54 1 PD 0.2205
19 54 2 PD 0.1521
20 54 3 PD 0.1920
21 58 1 PD 0.3801
22 58 2 PD 0.2148
23 58 3 PD 0.2036
24 61 1 PD 0.0818
25 61 2 PD 0.0628
26 66 1 CTRL 0.1879
27 66 2 CTRL 0.1476
28 66 3 CTRL 0.1975
29 67 1 PD 0.1759
30 67 2 PD 0.1418
31 67 3 PD 0.1834
32 67 4 PD 0.1464
33 69 1 CTRL 0.1775
34 69 2 CTRL 0.1479
35 71 1 PD 0.0902
36 71 2 PD 0.1852
37 71 3 PD 0.0733
38 74 1 CTRL 0.2094
39 74 2 CTRL 0.2185
40 74 3 CTRL 0.2015
41 75 1 CTRL 0.2396
42 75 2 CTRL 0.1872
43 76 1 PD 0.2393
44 76 2 PD 0.1656
45 76 3 PD 0.2584
46 78 1 CTRL 0.1557
47 78 2 CTRL 0.2599
48 78 3 CTRL 0.1434
49 79 1 PD 0.2288
50 79 2 PD 0.2267
51 80 1 PD 0.1893
52 80 2 PD 0.1930
53 81 1 CTRL 0.1103
54 81 2 CTRL 0.1196
55 81 3 CTRL 0.1051
56 82 1 PD 0.2272
57 82 2 PD 0.2124
58 82 3 PD 0.2547
59 82 4 PD 0.2063
60 85 1 PD 0.2566
61 85 2 PD 0.2079
62 86 1 PD 0.1778
63 86 2 PD 0.1943
64 87 1 PD 0.1417
65 87 2 PD 0.0803
66 89 1 PD 0.2306
67 89 2 PD 0.3029
68 89 3 PD 0.1503
69 91 1 PD 0.2962
70 91 2 PD 0.1438
71 91 3 PD 0.1464
72 92 1 PD 0.2347
73 92 2 PD 0.1677
74 92 3 PD 0.1046
75 93 1 CTRL 0.1657
76 93 2 CTRL 0.1549
77 93 3 CTRL 0.2212
78 94 1 PD 0.1813
79 94 2 PD 0.1235
80 94 3 PD 0.1161
81 96 1 PD 0.1249
82 96 2 PD 0.1663
83 97 1 PD 0.1575
84 97 2 PD 0.1304
85 97 3 PD 0.1579
86 97 4 PD 0.1013
87 98 1 PD 0.1980
88 98 2 PD 0.2489
89 98 3 PD 0.1549
90 98 4 PD 0.0226
91 99 1 PD 0.1383
92 99 2 PD 0.1492
93 99 3 PD 0.1687
94 99 4 PD 0.1647
95 101 1 CTRL 0.1089
96 101 2 CTRL 0.1304
97 101 3 CTRL 0.1174
98 102 1 PD 0.2487
99 102 2 PD 0.1620
100 102 3 PD 0.1879
101 102 4 PD 0.1719
102 103 1 PD 0.2010
103 103 2 PD 0.2284
104 104 1 PD 0.1803
105 104 2 PD 0.1699
106 105 1 PD 0.2600
107 105 2 PD 0.2527
108 105 3 PD 0.2004
109 106 1 CTRL 0.2083
110 106 2 CTRL 0.2291
111 106 3 CTRL 0.2996
112 110 1 PD 0.2003
113 110 2 PD 0.2601
114 110 3 PD 0.2064
115 112 1 PD 0.1480
116 112 2 PD 0.1692
117 112 3 PD 0.1173
118 114 1 CTRL 0.1837
119 114 2 CTRL 0.2439
120 114 3 CTRL 0.2347
121 115 1 PD 0.3499
122 115 2 PD 0.1403
123 115 3 PD 0.1079
124 116 1 PD 0.2918
125 116 2 PD 0.2381
126 116 3 PD 0.1116
127 120 1 CTRL 0.2246
128 120 2 CTRL 0.3014
129 122 1 CTRL 0.1899
130 122 2 CTRL 0.2336
131 122 3 CTRL 0.2829
132 125 1 PD 0.1919
133 125 2 PD 0.2513
134 125 3 PD 0.2356
135 127 1 PD 0.1732
136 127 2 PD 0.1581
137 127 3 PD 0.1552
138 129 1 PD 0.1715
139 129 2 PD 0.1550
140 135 1 PD 0.2354
141 135 2 PD 0.2128
142 135 3 PD 0.1605
143 136 1 PD 0.2952
144 136 2 PD 0.3557
145 136 3 PD 0.3414
146 137 1 PD 0.2623
147 137 2 PD 0.2815
148 140 1 PD 0.2156
149 140 2 PD 0.1495
150 141 1 CTRL 0.2814
151 141 2 CTRL 0.1975
152 142 1 CTRL 0.2492
153 142 2 CTRL 0.2348
154 143 1 PD 0.3227
155 143 2 PD 0.2401
156 144 1 CTRL 0.2654
157 144 2 CTRL 0.1684
158 146 1 PD 0.2406
159 146 2 PD 0.1947
160 149 1 CTRL 0.3243
161 149 2 CTRL 0.2345
162 150 1 CTRL 0.1844
163 150 2 CTRL 0.1644
164 152 1 PD 0.2613
165 152 2 PD 0.2944
166 153 1 CTRL 0.2268
167 153 2 CTRL 0.2117
168 155 1 PD 0.1867
169 155 2 PD 0.1588
170 156 1 PD 0.2256
171 156 2 PD 0.2887
172 156 3 PD 0.2348
173 159 1 PD 0.1554
174 159 2 PD 0.0797
175 160 1 PD 0.2001
176 160 2 PD 0.2147
177 162 1 PD 0.2374
178 162 2 PD 0.2286
179 162 3 PD 0.2794
180 163 1 CTRL 0.1707
181 163 2 CTRL 0.2192
182 165 1 PD 0.2086
183 165 2 PD 0.1367
184 167 1 PD 0.3820
185 167 2 PD 0.3491
186 169 1 CTRL 0.2033
187 169 2 CTRL 0.2925
188 171 1 PD 0.2175
189 171 2 PD 0.2671
190 174 1 PD 0.2515
191 174 2 PD 0.1912
192 182 1 PD 0.1082
193 182 2 PD 0.0860
194 187 1 PD 0.0771
195 187 2 PD 0.0665
196 189 1 PD 0.1257
197 189 2 PD 0.1837
198 190 1 PD 0.1208
199 190 2 PD 0.1008
200 193 1 PD 0.2546
201 193 2 PD 0.2372
202 194 1 CTRL 0.2013
203 194 2 CTRL 0.2298
204 201 1 PD 0.1082
205 201 2 PD 0.0919
206 204 1 PD 0.1997
207 204 2 PD 0.1913
208 205 1 CTRL 0.1993
209 205 2 CTRL 0.1017
210 208 1 PD 0.0694
211 208 2 PD 0.2629
212 209 1 PD 0.1420
213 209 2 PD 0.1933
214 211 1 PD 0.1165
215 211 2 PD 0.1632
216 214 1 PD 0.1847
217 214 2 PD 0.2078
218 219 1 CTRL 0.2555
219 219 2 CTRL 0.2970
220 222 1 PD 0.2180
221 222 2 PD 0.0967
222 223 1 PD 0.2216
223 223 2 PD 0.2817
224 229 1 PD 0.1267
225 229 2 PD 0.1553
The pivot_wider() function takes two key arguments: names_from, the variable whose values will become the new column names, and values_from, the variable whose values will fill the cells.
wide.ex <- example %>%
pivot_wider(names_from = wave, values_from = DAN)
wide.ex
# A tibble: 91 x 6
ID group `1` `2` `3` `4`
<int> <chr> <dbl> <dbl> <dbl> <dbl>
1 6 PD 0.162 0.168 0.215 NA
2 29 PD 0.175 0.136 NA NA
3 34 CTRL 0.166 0.140 NA NA
4 36 CTRL 0.152 0.205 NA NA
5 37 PD 0.219 0.158 0.259 NA
6 48 PD 0.130 0.270 0.248 NA
7 53 CTRL 0.211 0.152 NA NA
8 54 PD 0.220 0.152 0.192 NA
9 58 PD 0.380 0.215 0.204 NA
10 61 PD 0.0818 0.0628 NA NA
# … with 81 more rows
Going back to long:
The pivot_longer() function takes three key arguments: cols, the set of columns to be collapsed (referenced by position or by name); names_to, the name of the new column that will hold the old column names (the name is up to you); and values_to, the name of the new column that will hold the cell values associated with each variable combination.
long.ex <- wide.ex %>%
pivot_longer(cols = '1':'4',
names_to = "wave",
values_to = "DAN")
long.ex
# A tibble: 364 x 4
ID group wave DAN
<int> <chr> <chr> <dbl>
1 6 PD 1 0.162
2 6 PD 2 0.168
3 6 PD 3 0.215
4 6 PD 4 NA
5 29 PD 1 0.175
6 29 PD 2 0.136
7 29 PD 3 NA
8 29 PD 4 NA
9 34 CTRL 1 0.166
10 34 CTRL 2 0.140
# … with 354 more rows
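Note that the round trip reintroduced explicit NA rows for waves a person never completed (91 people x 4 waves = 364 rows, versus 225 rows in the original long data). If you would rather drop those, pivot_longer has a values_drop_na argument; a minimal sketch with hypothetical data:

```r
library(tidyverse)

# Hypothetical wide data where ID 29 is missing wave 3
wide.na <- tribble(
  ~ID, ~`1`, ~`2`, ~`3`,
  6, 0.162, 0.168, 0.215,
  29, 0.175, 0.136, NA
)

# values_drop_na = TRUE omits rows whose value would be NA,
# so ID 29 contributes only two rows
wide.na %>%
  pivot_longer(cols = `1`:`3`,
               names_to = "wave",
               values_to = "DAN",
               values_drop_na = TRUE)
```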
Many times datasets are, for lack of a better term, messy. We will talk later about the upfront work needed to avoid messy data. If you do have messy data, however, there are a number of helpful functions to tidy up your data.
One common way to represent longitudinal data is to name the variable with a wave signifier.
wide <- tribble(
  ~ID, ~ext_1, ~ext_2, ~ext_3,
  1, 4, 4, 4,
  2, 6, 5, 4,
  3, 4, 5, 6
)
wide
# A tibble: 3 x 4
ID ext_1 ext_2 ext_3
<dbl> <dbl> <dbl> <dbl>
1 1 4 4 4
2 2 6 5 4
3 3 4 5 6
If we tried to pivot_longer, we’d end up with:
wide %>%
pivot_longer(cols = ext_1:ext_3, names_to = "time", values_to = "EXT")
# A tibble: 9 x 3
ID time EXT
<dbl> <chr> <dbl>
1 1 ext_1 4
2 1 ext_2 4
3 1 ext_3 4
4 2 ext_1 6
5 2 ext_2 5
6 2 ext_3 4
7 3 ext_1 4
8 3 ext_2 5
9 3 ext_3 6
The time column is now specific to ext, which is a problem if I have more than one variable that I am pivoting. Moreover, we will end up using wave as our time variable in our models, and time will have to be numeric. So how can we separate out the ext part?
One way is to use the separate() function:
long<- wide %>%
pivot_longer(cols = ext_1:ext_3,
names_to = "time",
values_to = "EXT") %>%
separate(time, into = c("variable", "time"))
long
# A tibble: 9 x 4
ID variable time EXT
<dbl> <chr> <chr> <dbl>
1 1 ext 1 4
2 1 ext 2 4
3 1 ext 3 4
4 2 ext 1 6
5 2 ext 2 5
6 2 ext 3 4
7 3 ext 1 4
8 3 ext 2 5
9 3 ext 3 6
In terms of setting up your data, it is often helpful to include markers that separate parts of the variable name, e.g. "_" or ".". A variable named ext_1 is easier to separate than ext1.
Note also that the time column is a character rather than numeric. We need to change this in order to use time continuously in our models. There are a few ways to do it, but this is perhaps the most straightforward:
long$time <- as.numeric(long$time)
long
# A tibble: 9 x 4
ID variable time EXT
<dbl> <chr> <dbl> <dbl>
1 1 ext 1 4
2 1 ext 2 4
3 1 ext 3 4
4 2 ext 1 6
5 2 ext 2 5
6 2 ext 3 4
7 3 ext 1 4
8 3 ext 2 5
9 3 ext 3 6
However, something a little more elegant is to do both the separating AND the conversion to numeric within the original pivot_longer() call.
names_prefix strips a matching prefix from the new column names. Previously we had ext_1, ext_2, etc., which we had to separate with a different function, but this does it within pivot_longer:
wide %>%
pivot_longer(cols = ext_1:ext_3,
names_to = "time",
values_to = "EXT",
names_prefix = "ext_")
# A tibble: 9 x 3
ID time EXT
<dbl> <chr> <dbl>
1 1 1 4
2 1 2 4
3 1 3 4
4 2 1 6
5 2 2 5
6 2 3 4
7 3 1 4
8 3 2 5
9 3 3 6
names_transform applies transformations to the newly created columns. Here, instead of a separate as.numeric() call, we can make our time variable numeric directly:
wide %>%
pivot_longer(cols = ext_1:ext_3,
names_to = "time",
values_to = "EXT",
names_prefix = "ext_",
names_transform = list(time = as.numeric))
# A tibble: 9 x 3
ID time EXT
<dbl> <dbl> <dbl>
1 1 1 4
2 1 2 4
3 1 3 4
4 2 1 6
5 2 2 5
6 2 3 4
7 3 1 4
8 3 2 5
9 3 3 6
Another common problem that we often face is the need to unite two variables into one. Enter the creatively titled unite() function. Sometimes this is necessary when our time metric is entered in separate columns.
df <- tibble(
ID = c(1, 2, 3),
year = c(2020, 2020, 2020),
month = c(1, 1, 1),
day = c(1, 1, 1),
hour = c(4, 2, 5),
min = c(55, 17, 23))
df
# A tibble: 3 x 6
ID year month day hour min
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2020 1 1 4 55
2 2 2020 1 1 2 17
3 3 2020 1 1 5 23
To combine them into one time metric:
df %>%
unite(col = time, 5:6, sep = ":", remove = TRUE)
# A tibble: 3 x 5
ID year month day time
<dbl> <dbl> <dbl> <dbl> <chr>
1 1 2020 1 1 4:55
2 2 2020 1 1 2:17
3 3 2020 1 1 5:23
A date-time is a date plus a time: it uniquely identifies an instant in time (typically to the nearest second). These are called POSIXct in R.
library(lubridate)
today()
[1] "2021-01-26"
now()
[1] "2021-01-26 18:30:00 CST"
Bringing dates into R from some outside place (Excel, SPSS) can lead to confusion, as they can be formatted differently:
ymd("2017-01-31")
[1] "2017-01-31"
mdy("January 31st, 2017")
[1] "2017-01-31"
dmy("31-Jan-2017")
[1] "2017-01-31"
You can create these relatively straightforwardly by hand:
ymd_hms("2017-01-31 20:11:59")
[1] "2017-01-31 20:11:59 UTC"
mdy_hm("01/31/2017 08:01")
[1] "2017-01-31 08:01:00 UTC"
Or you can use existing columns. This is where the lubridate package comes in handy:
df %>%
mutate(t_1 = make_datetime(year, month, day, hour, min))
# A tibble: 3 x 7
ID year month day hour min t_1
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dttm>
1 1 2020 1 1 4 55 2020-01-01 04:55:00
2 2 2020 1 1 2 17 2020-01-01 02:17:00
3 3 2020 1 1 5 23 2020-01-01 05:23:00
Note that the t_1 variable is a POSIXct variable type. Once in this format, it is much easier to manipulate and work with dates and times.
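As a small illustration of why this matters, date-time arithmetic works directly on POSIXct values; a sketch with made-up timestamps:

```r
library(lubridate)

# Two hypothetical POSIXct timestamps
t1 <- ymd_hms("2020-01-01 04:55:00")
t2 <- ymd_hms("2020-01-01 05:23:00")

# difftime() computes the elapsed interval in the units you request
difftime(t2, t1, units = "mins")  # 28 minutes apart
```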
As with any project, but especially for longitudinal data, two of the most important aspects of data analysis are (a) not losing track of what you did and (b) staying organized. This is much harder than it sounds. I find that a combination of 1. RStudio projects, 2. git, and 3. codebooks is helpful in accomplishing these two goals. We will talk about #1 and #2, but I also encourage you to read about git. These are not the only ways to do these sorts of analyses, but I feel that exposure to them is helpful, as in the social sciences these sorts of decisions are often not discussed.
What these help to do is create a chain of processing where you start with RAW data and end up with cleaned data. Importantly, you can always start over from the raw data. This matters both for people wanting to reproduce your findings and for your future self figuring out where a certain variable came from.
We start creating the chain of processing by documenting all of our code, every bit of it, inside rmarkdown documents, as the language is easier than LaTeX and more helpful than plain text.
When I create an rmarkdown document for my own research projects, I always start by setting up 3 components:
Below, we will step through each of these separately, setting ourselves up to (hopefully) flawlessly communicate with R and our data. Note that you do not need to use rmarkdown, but I think it is much more useful than standard .R syntax.
Packages seem like the most basic step, but loading them is actually very important. Depending on what gets loaded, you might overwrite functions from other packages. (Note: I will often reload packages or not follow this advice within lectures for didactic reasons, choosing to put library calls above the code.)
The second step is a codebook. Arguably, this is the first step because you should create the codebook long before you open R and load your data.
Why a codebook? Because you typically have a lot of variables, and you will not be able to remember all the details that go into each one of them (rating scale, what the actual item was, whether it was recoded, etc.). This is especially true now that data are often collected online, which tends to produce placeholder variable names that then need to be processed somehow. The codebook serves as a means to document the RAW data. It will also allow us to automate some tasks that are somewhat cumbersome, facilitate open data practices, and efficiently see what variables are available. Ultimately, we want to be able to show how we got from the start, with the messy raw data, to our analyses and results at the end. A codebook makes this easier.
To illustrate, we are going to use some data from the German Socioeconomic Panel Study (GSOEP), an ongoing panel study in Germany. Note that these data are for teaching purposes only, shared under the license for the Comprehensive SOEP teaching dataset, which I, as a contracted SOEP user, can use for teaching purposes. These data represent select cases from the full dataset and should not be used for the purpose of publication. The full data are available for free at https://www.diw.de/en/diw_02.c.222829.en/access_and_ordering.html.
For this tutorial, I created the codebook for you and included what I believe are the core columns you may need. Some of these columns will not be particularly helpful for this dataset. For example, many of you likely work with datasets that have only a single file, while others work with datasets spread across many files (e.g., different waves, different sources). As a result, for some of you the “dataset” column of the codebook may only have a single value, whereas for others it may have multiple.
Here are my core columns that are based on the original data:
dataset: this column indexes the name of the dataset that you will be pulling the data from. This is important because we will use this info later on (see purrr tutorial) to load and clean specific data files. Even if you don’t have multiple data sets, I believe consistency is more important and suggest using this.
old_name: this column is the name of the variable in the data you are pulling it from. This should be exact. The goal of this column is that it will allow us to select() variables from the original data file and rename them something that is more useful to us. If you have worked with qualtrics (really any data) you know why this is important.
item_text: this column is the original text that participants saw or a description of the item.
scale: this column tells you what the scale of the variable is. Is it a numeric variable, a text variable, etc. This is helpful for knowing the plausible range.
reverse: this column tells you whether items in a scale need to be reverse coded. I recommend coding this as 1 (leave alone) and -1 (reverse) for reasons that will become clear later.
mini: this column represents the minimum value of scales that are numeric. Leave blank otherwise.
maxi: this column represents the maximum value of scales that are numeric. Leave blank otherwise.
recode: sometimes, we want to recode variables for analyses (e.g. for categorical variables with many levels where sample sizes for some levels are too small to actually do anything with it). I use this column to note the kind of recoding I’ll do to a variable for transparency.
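One reason the ±1 coding in the reverse column pays off (this is a sketch of a common pattern, not necessarily the exact approach we will use later) is that reverse scoring becomes a single vectorized step: a reverse-keyed item on a scale running from mini to maxi is recoded as (mini + maxi) - x.

```r
# Hypothetical items on a 1-5 scale; reverse = -1 flags items to flip
mini <- 1
maxi <- 5
x <- c(2, 4, 5)          # raw responses to three items
reverse <- c(1, -1, 1)   # the codebook's reverse column for those items

# Reverse-keyed items become (mini + maxi) - x; others are left alone
recoded <- ifelse(reverse == -1, mini + maxi - x, x)
recoded  # 2 2 5
```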
Here are additional columns that will make our lives easier or are applicable to some but not all data sets:
category: broad categories that different variables can be put into. I’m a fan of naming them things like “outcome”, “predictor”, “moderator”, “demographic”, “procedural”, etc. but sometimes use more descriptive labels like “Big 5” to indicate the model from which the measures are derived.
label: label is basically one level lower than category. So if the category is Big 5, the label would be, for example, “A” for Agreeableness, “SWB” for subjective well-being, etc. This column is most important and useful when you have multiple items in a scale, so I’ll typically leave it blank when something is a standalone variable (e.g. sex, single-item scales, etc.).
item_name: This is the lowest level and most descriptive variable. It indicates which item in a scale something is. So it may be “kind” for Agreeableness or “sex” for the demographic biological sex variable.
year: for longitudinal data, we have several waves of data, and the name of the same item across waves is often different, so it’s important to note which wave an item belongs to. You can do this by noting the wave (e.g. 1, 2, 3), but I prefer the actual year the data were collected (e.g. 2005, 2009, etc.) if that is appropriate. See Lecture #1 for a discussion of meaningful time metrics. Note that this differs from that discussion: in your codebook you want to describe how you collected the data, not necessarily how you want to analyze them.
new_name: This is a column that brings together much of the information we’ve already collected. Its purpose is to be the new name that we will give to the variable, one that is more useful and descriptive to us. It is a constructed variable that combines the others. I like to make it a combination of “category”, “label”, “item_name”, and year, using varying combos of "_" and "." that we can use later with tidyverse functions. I typically construct this variable in Excel using the CONCATENATE() function, but it could also be done in R. The reason I do it in Excel is that it makes the codebook easier for someone else to review.
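If you would rather build new_name in R than in Excel, here is a sketch using a hypothetical slice of a codebook (str_c() plays the role of CONCATENATE()):

```r
library(tidyverse)

# Hypothetical slice of a codebook; building new_name in R rather than Excel
cb <- tribble(
  ~category, ~label, ~item_name, ~year,
  "Big 5", "C", "thorough", 2005,
  "Big 5", "E", "sociable", 2009
)

# Glue the pieces together with the "__", "_", and "." separators
cb <- cb %>%
  mutate(new_name = str_c(category, "__", label, "_", item_name, ".", year))
cb$new_name  # "Big 5__C_thorough.2005" "Big 5__E_sociable.2009"
```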
There is a separate discussion to be had on naming conventions for your variables, but the important idea to remember is that names convey important information and we want to use this information later on to make our life easier. By coding these variables using this information AND systematically using different separators we can accomplish this goal.
These are just suggestions, but after working with many longitudinal datasets I will say that all of them are horrible in some way; doing this makes them less horrible. Is it some upfront work? Yes. Will it ultimately save you time? Yes. Also, if you plan this prior to running a study, you are making some sort of codebook anyway, right? Might as well kill two birds with one stone.
You can make the codebook any way you want, but the two best options are Microsoft Excel and Google Sheets. Not because they necessarily function best, but because they are relatively ubiquitous and easy to share.
We will create the codebook outside of R and then bring it into R by saving it as a csv. You can think of the codebook as a way of coding your data decisions before putting anything into R.
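To preview the kind of automation the codebook buys us, here is a sketch (with hypothetical raw data and a hypothetical two-row codebook) of using the old_name and new_name columns to select and rename variables in one step:

```r
library(tidyverse)

# Hypothetical raw data using original (old) variable names
raw <- tibble(vp12501 = c(4, 5), vp12502 = c(3, 2))

# Hypothetical mini-codebook mapping old names to descriptive new names
cb <- tibble(
  old_name = c("vp12501", "vp12502"),
  new_name = c("Big 5__C_thorough.2005", "Big 5__E_communic.2005")
)

# Select exactly the codebook's variables, then rename them in one step
renamed <- raw %>%
  select(all_of(cb$old_name)) %>%
  set_names(cb$new_name)
names(renamed)
```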
Below, I’ll load in the codebook we will use for this study, which will include all of the above columns.
codebook <- read.csv("https://raw.githubusercontent.com/josh-jackson/longitudinal-2021/master/codebook.csv")
codebook <- codebook %>%
mutate(old_name = str_to_lower(old_name))
codebook
dataset old_name item_text
1 persnr Never Changing Person ID
2 hhnr household ID
3 ppfad gebjahr Year of Birth
4 ppfad sex Sex
5 vp vp12501 Thorough Worker
6 zp zp12001 Thorough Worker
7 bdp bdp15101 Thorough Worker
8 vp vp12502 Am communicative
9 zp zp12002 Am communicative
10 bdp bdp15102 Am communicative
11 vp vp12503 Am sometimes too coarse with others
12 zp zp12003 Am sometimes too coarse with others
13 bdp bdp15103 Am sometimes too coarse with others
14 vp vp12504 Am original
15 zp zp12004 Am original
16 bdp bdp15104 Am original
17 vp vp12505 Worry a lot
18 zp zp12005 Worry a lot
19 bdp bdp15105 Worry a lot
20 vp vp12506 Able to forgive
21 zp zp12006 Able to forgive
22 bdp bdp15106 Able to forgive
23 vp vp12507 Tend to be lazy
24 zp zp12007 Tend to be lazy
25 bdp bdp15107 Tend to be lazy
26 vp vp12508 Am sociable
27 zp zp12008 Am sociable
28 bdp bdp15108 Am sociable
29 vp vp12509 Value artistic experiences
30 zp zp12009 Value artistic experiences
31 bdp bdp15109 Value artistic experiences
32 vp vp12510 Somewhat nervous
33 zp zp12010 Somewhat nervous
34 bdp bdp15110 Somewhat nervous
35 vp vp12511 Carry out tasks efficiently
36 zp zp12011 Carry out tasks efficiently
37 bdp bdp15111 Carry out tasks efficiently
38 vp vp12512 Reserved
39 zp zp12012 Reserved
40 bdp bdp15112 Reserved
41 vp vp12513 Friendly with others
42 zp zp12013 Friendly with others
43 bdp bdp15113 Friendly with others
44 vp vp12514 Have a lively imagination
45 zp zp12014 Have a lively imagination
46 bdp bdp15114 Have a lively imagination
47 vp vp12515 Deal well with stress
48 zp zp12015 Deal well with stress
49 bdp bdp15115 Deal well with stress
50 vp vp15307 Child Born
51 wp wp14107 Child Born
52 xp xp14807 Child Born
53 yp yp15407 Child Born
54 zp zp15607 Child Born
55 bap bap15907 Child Born
56 bbp bbp15110 Child Born
57 bcp bcp15010 Child Born
58 bdp bdp15710 Child Born
59 bep bep15010 Child Born
60 bfp bfp17310 Child Born
61 vp vp15310 Child Moved Out
62 wp wp14110 Child Moved Out
63 xp xp14813 Child Moved Out
64 yp yp15413 Child Moved Out
65 zp zp15613 Child Moved Out
66 bap bap15913 Child Moved Out
67 bbp bbp15116 Child Moved Out
68 bcp bcp15016 Child Moved Out
69 bdp bdp15716 Child Moved Out
70 bep bep15016 Child Moved Out
71 bfp bfp17316 Child Moved Out
72 vp vp15316 Divorced
73 wp wp14116 Divorced
74 xp xp14819 Divorced
75 yp yp15419 Divorced
76 zp zp15619 Divorced
77 bap bap15919 Divorced
78 bbp bbp15122 Divorced
79 bcp bcp15022 Divorced
80 bdp bdp15722 Divorced
81 bep bep15022 Divorced
82 bfp bfp17322 Divorced
83 vp vp15322 Father Died
84 wp wp14122 Father Died
85 xp xp14825 Father Died
86 yp yp15425 Father Died
87 zp zp15625 Father Died
88 bap bap15925 Father Died
89 bbp bbp15128 Father Died
90 bcp bcp15028 Father Died
91 bdp bdp15728 Father Died
92 bep bep15028 Father Died
93 bfp bfp17328 Father Died
94 bbp bbp15101 Got Together with a New Partner
95 bcp bcp15001 Got Together with a New Partner
96 bdp bdp15701 Got Together with a New Partner
97 bep bep15001 Got Together with a New Partner
98 bfp bfp17301 Got Together with a New Partner
99 vp vp15301 Married
100 wp wp14101 Married
101 xp xp14801 Married
102 yp yp15401 Married
103 zp zp15601 Married
104 bap bap15901 Married
105 bbp bbp15104 Married
106 bcp bcp15004 Married
107 bdp bdp15704 Married
108 bep bep15004 Married
109 bfp bfp17304 Married
110 vp vp15325 Mother Died
111 wp wp14125 Mother Died
112 xp xp14828 Mother Died
113 yp yp15428 Mother Died
114 zp zp15628 Mother Died
115 bap bap15928 Mother Died
116 bbp bbp15131 Mother Died
117 bcp bcp15031 Mother Died
118 bdp bdp15731 Mother Died
119 bep bep15031 Mother Died
120 bfp bfp17331 Mother Died
121 vp vp15304 Moved In Together
122 wp wp14104 Moved In Together
123 xp xp14804 Moved In Together
124 yp yp15404 Moved In Together
125 zp zp15604 Moved In Together
126 bap bap15904 Moved In Together
127 bbp bbp15107 Moved In Together
128 bcp bcp15007 Moved In Together
129 bdp bdp15707 Moved In Together
130 bep bep15007 Moved In Together
131 bfp bfp17307 Moved In Together
132 vp vp15319 Partner Died
133 wp wp14119 Partner Died
134 xp xp14822 Partner Died
135 yp yp15422 Partner Died
136 zp zp15622 Partner Died
137 bap bap15922 Partner Died
138 bbp bbp15125 Partner Died
139 bcp bcp15025 Partner Died
140 bdp bdp15725 Partner Died
141 bep bep15025 Partner Died
142 bfp bfp17325 Partner Died
143 vp vp15313 Separated From Partner
144 wp wp14113 Separated From Partner
145 xp xp14816 Separated From Partner
146 yp yp15416 Separated From Partner
147 zp zp15616 Separated From Partner
148 bap bap15916 Separated From Partner
149 bbp bbp15119 Separated From Partner
150 bcp bcp15019 Separated From Partner
151 bdp bdp15719 Separated From Partner
152 bep bep15019 Separated From Partner
153 bfp bfp17319 Separated From Partner
scale
1
2
3 numeric
4 \n1 [1] Male\n2 [2] Female\n-1 [-1] No Answer\n-2 [-2] Does not apply\n-3 [-3] Answer improbable\n-4 [-4] Inadmissible multiple response\n-5 [-5] Not included in this version of the questionnaire\n-6 [-6] Version of questionnaire with modified filtering\n56186 57630 24 0 0 0 0 0
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
category label item_name year new_name
1 Procedural SID 0 Procedural__SID
2 Procedural household 0 Procedural__household
3 Demographic DOB 0 Demographic__DOB
4 Demographic Sex 0 Demographic__Sex
5 Big 5 C thorough 2005 Big 5__C_thorough.2005
6 Big 5 C thorough 2009 Big 5__C_thorough.2009
7 Big 5 C thorough 2013 Big 5__C_thorough.2013
8 Big 5 E communic 2005 Big 5__E_communic.2005
9 Big 5 E communic 2009 Big 5__E_communic.2009
10 Big 5 E communic 2013 Big 5__E_communic.2013
11 Big 5 A coarse 2005 Big 5__A_coarse.2005
12 Big 5 A coarse 2009 Big 5__A_coarse.2009
13 Big 5 A coarse 2013 Big 5__A_coarse.2013
14 Big 5 O original 2005 Big 5__O_original.2005
15 Big 5 O original 2009 Big 5__O_original.2009
16 Big 5 O original 2013 Big 5__O_original.2013
17 Big 5 N worry 2005 Big 5__N_worry.2005
18 Big 5 N worry 2009 Big 5__N_worry.2009
19 Big 5 N worry 2013 Big 5__N_worry.2013
20 Big 5 A forgive 2005 Big 5__A_forgive.2005
21 Big 5 A forgive 2009 Big 5__A_forgive.2009
22 Big 5 A forgive 2013 Big 5__A_forgive.2013
23 Big 5 C lazy 2005 Big 5__C_lazy.2005
24 Big 5 C lazy 2009 Big 5__C_lazy.2009
25 Big 5 C lazy 2013 Big 5__C_lazy.2013
26 Big 5 E sociable 2005 Big 5__E_sociable.2005
27 Big 5 E sociable 2009 Big 5__E_sociable.2009
28 Big 5 E sociable 2013 Big 5__E_sociable.2013
29 Big 5 O artistic 2005 Big 5__O_artistic.2005
30 Big 5 O artistic 2009 Big 5__O_artistic.2009
31 Big 5 O artistic 2013 Big 5__O_artistic.2013
32 Big 5 N nervous 2005 Big 5__N_nervous.2005
33 Big 5 N nervous 2009 Big 5__N_nervous.2009
34 Big 5 N nervous 2013 Big 5__N_nervous.2013
35 Big 5 C efficient 2005 Big 5__C_efficient.2005
36 Big 5 C efficient 2009 Big 5__C_efficient.2009
37 Big 5 C efficient 2013 Big 5__C_efficient.2013
38 Big 5 E reserved 2005 Big 5__E_reserved.2005
39 Big 5 E reserved 2009 Big 5__E_reserved.2009
40 Big 5 E reserved 2013 Big 5__E_reserved.2013
41 Big 5 A friendly 2005 Big 5__A_friendly.2005
42 Big 5 A friendly 2009 Big 5__A_friendly.2009
43 Big 5 A friendly 2013 Big 5__A_friendly.2013
44 Big 5 O imagin 2005 Big 5__O_imagin.2005
45 Big 5 O imagin 2009 Big 5__O_imagin.2009
46 Big 5 O imagin 2013 Big 5__O_imagin.2013
47 Big 5 N dealStress 2005 Big 5__N_dealStress.2005
48 Big 5 N dealStress 2009 Big 5__N_dealStress.2009
49 Big 5 N dealStress 2013 Big 5__N_dealStress.2013
50 Life Event ChldBrth 2005 Life Event__ChldBrth.2005
51 Life Event ChldBrth 2006 Life Event__ChldBrth.2006
52 Life Event ChldBrth 2007 Life Event__ChldBrth.2007
53 Life Event ChldBrth 2008 Life Event__ChldBrth.2008
54 Life Event ChldBrth 2009 Life Event__ChldBrth.2009
55 Life Event ChldBrth 2010 Life Event__ChldBrth.2010
56 Life Event ChldBrth 2011 Life Event__ChldBrth.2011
57 Life Event ChldBrth 2012 Life Event__ChldBrth.2012
58 Life Event ChldBrth 2013 Life Event__ChldBrth.2013
59 Life Event ChldBrth 2014 Life Event__ChldBrth.2014
60 Life Event ChldBrth 2015 Life Event__ChldBrth.2015
61 Life Event ChldMvOut 2005 Life Event__ChldMvOut.2005
62 Life Event ChldMvOut 2006 Life Event__ChldMvOut.2006
63 Life Event ChldMvOut 2007 Life Event__ChldMvOut.2007
64 Life Event ChldMvOut 2008 Life Event__ChldMvOut.2008
65 Life Event ChldMvOut 2009 Life Event__ChldMvOut.2009
66 Life Event ChldMvOut 2010 Life Event__ChldMvOut.2010
67 Life Event ChldMvOut 2011 Life Event__ChldMvOut.2011
68 Life Event ChldMvOut 2012 Life Event__ChldMvOut.2012
69 Life Event ChldMvOut 2013 Life Event__ChldMvOut.2013
70 Life Event ChldMvOut 2014 Life Event__ChldMvOut.2014
71 Life Event ChldMvOut 2015 Life Event__ChldMvOut.2015
72 Life Event Divorce 2005 Life Event__Divorce.2005
73 Life Event Divorce 2006 Life Event__Divorce.2006
74 Life Event Divorce 2007 Life Event__Divorce.2007
75 Life Event Divorce 2008 Life Event__Divorce.2008
76 Life Event Divorce 2009 Life Event__Divorce.2009
77 Life Event Divorce 2010 Life Event__Divorce.2010
78 Life Event Divorce 2011 Life Event__Divorce.2011
79 Life Event Divorce 2012 Life Event__Divorce.2012
80 Life Event Divorce 2013 Life Event__Divorce.2013
81 Life Event Divorce 2014 Life Event__Divorce.2014
82 Life Event Divorce 2015 Life Event__Divorce.2015
83 Life Event DadDied 2005 Life Event__DadDied.2005
84 Life Event DadDied 2006 Life Event__DadDied.2006
85 Life Event DadDied 2007 Life Event__DadDied.2007
86 Life Event DadDied 2008 Life Event__DadDied.2008
87 Life Event DadDied 2009 Life Event__DadDied.2009
88 Life Event DadDied 2010 Life Event__DadDied.2010
89 Life Event DadDied 2011 Life Event__DadDied.2011
90 Life Event DadDied 2012 Life Event__DadDied.2012
91 Life Event DadDied 2013 Life Event__DadDied.2013
92 Life Event DadDied 2014 Life Event__DadDied.2014
93 Life Event DadDied 2015 Life Event__DadDied.2015
94 Life Event NewPart 2011 Life Event__NewPart.2011
95 Life Event NewPart 2012 Life Event__NewPart.2012
96 Life Event NewPart 2013 Life Event__NewPart.2013
97 Life Event NewPart 2014 Life Event__NewPart.2014
98 Life Event NewPart 2015 Life Event__NewPart.2015
99 Life Event Married 2005 Life Event__Married.2005
100 Life Event Married 2006 Life Event__Married.2006
101 Life Event Married 2007 Life Event__Married.2007
102 Life Event Married 2008 Life Event__Married.2008
103 Life Event Married 2009 Life Event__Married.2009
104 Life Event Married 2010 Life Event__Married.2010
105 Life Event Married 2011 Life Event__Married.2011
106 Life Event Married 2012 Life Event__Married.2012
107 Life Event Married 2013 Life Event__Married.2013
108 Life Event Married 2014 Life Event__Married.2014
109 Life Event Married 2015 Life Event__Married.2015
110 Life Event MomDied 2005 Life Event__MomDied.2005
111 Life Event MomDied 2006 Life Event__MomDied.2006
112 Life Event MomDied 2007 Life Event__MomDied.2007
113 Life Event MomDied 2008 Life Event__MomDied.2008
114 Life Event MomDied 2009 Life Event__MomDied.2009
115 Life Event MomDied 2010 Life Event__MomDied.2010
116 Life Event MomDied 2011 Life Event__MomDied.2011
117 Life Event MomDied 2012 Life Event__MomDied.2012
118 Life Event MomDied 2013 Life Event__MomDied.2013
119 Life Event MomDied 2014 Life Event__MomDied.2014
120 Life Event MomDied 2015 Life Event__MomDied.2015
121 Life Event MoveIn 2005 Life Event__MoveIn.2005
122 Life Event MoveIn 2006 Life Event__MoveIn.2006
123 Life Event MoveIn 2007 Life Event__MoveIn.2007
124 Life Event MoveIn 2008 Life Event__MoveIn.2008
125 Life Event MoveIn 2009 Life Event__MoveIn.2009
126 Life Event MoveIn 2010 Life Event__MoveIn.2010
127 Life Event MoveIn 2011 Life Event__MoveIn.2011
128 Life Event MoveIn 2012 Life Event__MoveIn.2012
129 Life Event MoveIn 2013 Life Event__MoveIn.2013
130 Life Event MoveIn 2014 Life Event__MoveIn.2014
131 Life Event MoveIn 2015 Life Event__MoveIn.2015
132 Life Event PartDied 2005 Life Event__PartDied.2005
133 Life Event PartDied 2006 Life Event__PartDied.2006
134 Life Event PartDied 2007 Life Event__PartDied.2007
135 Life Event PartDied 2008 Life Event__PartDied.2008
136 Life Event PartDied 2009 Life Event__PartDied.2009
137 Life Event PartDied 2010 Life Event__PartDied.2010
138 Life Event PartDied 2011 Life Event__PartDied.2011
139 Life Event PartDied 2012 Life Event__PartDied.2012
140 Life Event PartDied 2013 Life Event__PartDied.2013
141 Life Event PartDied 2014 Life Event__PartDied.2014
142 Life Event PartDied 2015 Life Event__PartDied.2015
143 Life Event SepPart 2005 Life Event__SepPart.2005
144 Life Event SepPart 2006 Life Event__SepPart.2006
145 Life Event SepPart 2007 Life Event__SepPart.2007
146 Life Event SepPart 2008 Life Event__SepPart.2008
147 Life Event SepPart 2009 Life Event__SepPart.2009
148 Life Event SepPart 2010 Life Event__SepPart.2010
149 Life Event SepPart 2011 Life Event__SepPart.2011
150 Life Event SepPart 2012 Life Event__SepPart.2012
151 Life Event SepPart 2013 Life Event__SepPart.2013
152 Life Event SepPart 2014 Life Event__SepPart.2014
153 Life Event SepPart 2015 Life Event__SepPart.2015
reverse mini maxi recode
1 1 NA NA
2 1 NA NA
3 1 NA NA
4 1 NA NA
5 1 1 7
6 1 1 7
7 1 1 7
8 1 1 7
9 1 1 7
10 1 1 7
11 -1 1 7
12 -1 1 7
13 -1 1 7
14 1 1 7
15 1 1 7
16 1 1 7
17 -1 1 7
18 -1 1 7
19 -1 1 7
20 1 1 7
21 1 1 7
22 1 1 7
23 -1 1 7
24 -1 1 7
25 -1 1 7
26 1 1 7
27 1 1 7
28 1 1 7
29 1 1 7
30 1 1 7
31 1 1 7
32 -1 1 7
33 -1 1 7
34 -1 1 7
35 1 1 7
36 1 1 7
37 1 1 7
38 -1 1 7
39 -1 1 7
40 -1 1 7
41 1 1 7
42 1 1 7
43 1 1 7
44 1 1 7
45 1 1 7
46 1 1 7
47 1 1 7
48 1 1 7
49 1 1 7
50-153 1 NA NA 1 = experienced; 0 = did not experience
(rows 50 through 153, the life-event items, all share these same values)
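The reverse, mini, and maxi columns of the codebook are what you would use to reverse-score negatively keyed items (the rows where reverse is -1). A common recoding rule is new value = mini + maxi - old value. Below is a minimal sketch of that rule on toy data; the item names and values are illustrative, not the actual GSOEP variables.

```r
library(dplyr)
library(tibble)

# Toy data mimicking the codebook columns; names/values are made up.
items <- tibble(
  item    = c("item_keyed_negatively", "item_keyed_positively"),
  value   = c(6, 3),
  reverse = c(-1, 1),  # -1 = reverse-keyed, 1 = keep as-is
  mini    = c(1, 1),   # scale minimum
  maxi    = c(7, 7)    # scale maximum
)

# Reverse-score only the reverse-keyed items: mini + maxi - value
items <- items %>%
  mutate(value_recoded = if_else(reverse == -1, mini + maxi - value, value))
```

On a 1-7 scale, a reverse-keyed response of 6 becomes 1 + 7 - 6 = 2, while positively keyed items are left untouched.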
First, we need to load the data. We will use three waves of personality data, collected between 2005 and 2013, from the German Socio-Economic Panel Study (GSOEP), a longitudinal study of German households that has been running since 1984.
Note: we will be using the teaching set of the GSOEP data, so I will not be pulling from the raw files. I will also not mirror the format in which you would usually load the GSOEP, because that is slightly more complicated and something we will return to in a later tutorial once we have more skills. I've left that code in for now, but it won't make much sense yet.
The code below shows how I would read in and rename a wide-format data set using the codebook I created.
old.names <- codebook$old_name # old column names from the codebook
new.names <- codebook$new_name # new column names to replace them with
soep <- read.csv("https://raw.githubusercontent.com/josh-jackson/longitudinal-2021/master/soepdata.csv") # read in data
soep <- soep %>%
  dplyr::select(all_of(old.names)) %>% # keep only the columns listed in the codebook
  setNames(new.names) # rename those columns with the new names
paged_table(soep)
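Once renamed, the repeated-measure columns follow an item.year pattern (e.g. Life Event__ChldBrth.2015), which makes it easy to pivot the wide data frame to long with tidyr. Here is a sketch on toy data with that naming pattern; the id column and values are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy wide data mimicking the renamed GSOEP columns (item.year pattern).
wide <- tibble(
  pid = 1:2,
  `Life Event__ChldBrth.2014` = c(0, 1),
  `Life Event__ChldBrth.2015` = c(1, 0)
)

# Split each column name at the "." into an item part and a year part,
# producing one row per person per item per year.
long <- wide %>%
  pivot_longer(
    cols = -pid,
    names_to = c("item", "year"),
    names_sep = "\\.",
    values_to = "value"
  )
```

Each person now has one row per item-year combination, which is the long (person-period) shape that MLM and ggplot2 expect.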